2025.7.13 Decision trees

Let's start by just using it.

code:py.py
 from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 from sklearn.tree import DecisionTreeClassifier
 X, y = load_iris(return_X_y=True)
 X_train, X_test, y_train, y_test = train_test_split(X, y)
 model = DecisionTreeClassifier()  # create the model instance
 model.fit(X_train, y_train)  # train
 print('# Predict:', model.predict(X_test))  # predict on the test data
 print('# Accuracy:', model.score(X_test, y_test))  # accuracy
 '''
 # Predict: [0 1 2 1 0 2 1 2 1 0 2 2 2 1 0 2 1 0 0 1 2 1 1 2 0 1 2 1 2 0 1 2 0 0 0 1 1 0]
 # Accuracy: 0.9736842105263158
 '''

Here random_state is not set (neither in train_test_split nor in DecisionTreeClassifier), so the results change every time you run the code.
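A minimal sketch of the reproducible version, assuming you want the same split and the same tree on every run. The seed value 0 is arbitrary (any fixed integer works), and the helper `run` is a hypothetical name used only for this example. Both objects get a seed because DecisionTreeClassifier also uses randomness internally (it permutes the features at each split).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def run(seed):
    """Hypothetical helper: split, train, evaluate with a fixed seed."""
    X, y = load_iris(return_X_y=True)
    # A fixed random_state makes the train/test split deterministic
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    # The tree has its own random_state as well; fix it too
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    return model.predict(X_test), model.score(X_test, y_test)


pred_a, acc_a = run(0)
pred_b, acc_b = run(0)
print((pred_a == pred_b).all(), acc_a == acc_b)  # True True
```

Running it twice with the same seed gives identical predictions and identical accuracy; changing the seed generally changes both.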

score returns the accuracy, i.e. the fraction of test samples that are classified correctly.
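As a quick check, a sketch (random_state=0 is fixed here only so that reruns agree): the value from score should equal the share of predictions that match the true labels, computed by hand, and also sklearn.metrics.accuracy_score.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

y_pred = model.predict(X_test)
score = model.score(X_test, y_test)
# Accuracy = (number of correct predictions) / (number of test samples)
manual = (y_pred == y_test).mean()
print(score, manual, accuracy_score(y_test, y_pred))  # all three values agree
```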

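The predict_proba method returns the predicted probability of each class. The predict method corresponds to choosing, for each sample, the class with the highest of these probabilities.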

code:p.py
 # model and X_test come from the code above
 model.predict_proba(X_test)
 '''
 array([[1., 0., 0.],
 ...
 '''
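The correspondence between predict_proba and predict can be checked directly. This sketch assumes the same iris setup with random_state=0: it takes the column index of the largest probability in each row and maps it to a label through model.classes_ (the columns of predict_proba follow the order of classes_).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

proba = model.predict_proba(X_test)  # shape (n_samples, n_classes); each row sums to 1
# Column j of proba belongs to the label model.classes_[j]
from_proba = model.classes_[np.argmax(proba, axis=1)]
print(proba.shape)                                  # (38, 3)
print((from_proba == model.predict(X_test)).all())  # True
```

With the default test_size of 0.25, 38 of the 150 iris samples end up in the test set, hence 38 rows.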

Visualization

Use plot_tree to visualize the decision tree.

code:p1.py
 from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 from sklearn.tree import DecisionTreeClassifier, plot_tree
 import matplotlib.pyplot as plt
 X, y = load_iris(return_X_y=True)
 X_train, X_test, y_train, y_test = train_test_split(X, y)
 model = DecisionTreeClassifier()
 model.fit(X_train, y_train)
 plot_tree(model)
 plt.show()

Let's dress the plot up as follows:
Make the figure larger ... figsize
Show the feature names ... feature_names
Show the class names ... class_names
Color each node according to its predicted class ... set with filled

code:p.py
 from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 from sklearn.tree import DecisionTreeClassifier, plot_tree
 import matplotlib.pyplot as plt
 dataset = load_iris()
 X, y = dataset.data, dataset.target
 X_train, X_test, y_train, y_test = train_test_split(X, y)
 model = DecisionTreeClassifier()
 model.fit(X_train, y_train)
 plt.figure(figsize=(20, 20))
 plot_tree(
     model,
     feature_names=dataset.feature_names,
     class_names=dataset.target_names,
     filled=True,
 )
 plt.show()

https://scrapbox.io/files/6873a3b64e86f7813b94f52b.png

sklearn.tree.plot_tree(decision_tree, *, max_depth=None, feature_names=None, class_names=None, label='all', filled=False, impurity=True, node_ids=False, proportion=False, rounded=False, precision=3, ax=None, fontsize=None)

decision_tree ... decision tree regressor or classifier

The decision tree to be plotted.

max_depth ... int, default=None

The maximum depth of the representation. If None, the tree is fully generated.

feature_names ... array-like of str, default=None

Names of each of the features. If None, generic names will be used (“x[0]”, “x[1]”, …).

class_names ... array-like of str or True, default=None

Names of each of the target classes in ascending numerical order. Only relevant for classification and not supported for multi-output. If True, shows a symbolic representation of the class name.

label ... {‘all’, ‘root’, ‘none’}, default=‘all’

Whether to show informative labels for impurity, etc. Options include ‘all’ to show them at every node, ‘root’ to show them only at the top root node, or ‘none’ to show them at no node.

filled ... bool, default=False

When set to True, paint nodes to indicate the majority class for classification, the extremity of values for regression, or the purity of the node for multi-output.

impurity ... bool, default=True

When set to True, show the impurity at each node.

node_ids ... bool, default=False

When set to True, show the ID number on each node.

proportion ... bool, default=False

When set to True, change the display of ‘values’ and/or ‘samples’ to be proportions and percentages respectively.

rounded ... bool, default=False

When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.

precision ... int, default=3

Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.

ax ... matplotlib axis, default=None

Axes to plot to. If None, use current axis. Any previous content is cleared.

fontsize ... int, default=None

Size of text font. If None, determined automatically to fit figure.
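Several of the parameters above can be combined. A sketch under these assumptions: the Agg backend is selected so that it also runs without a display, and the output file name iris_tree_depth2.png is arbitrary (in an interactive session, plt.show() works as in the earlier examples). Passing ax= makes plot_tree draw into the Axes created by plt.subplots instead of the current one.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; must be chosen before importing pyplot
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

dataset = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(dataset.data, dataset.target)

fig, ax = plt.subplots(figsize=(12, 6))
annotations = plot_tree(
    model,
    max_depth=2,            # draw only the top levels; deeper subtrees are abbreviated
    feature_names=dataset.feature_names,
    class_names=dataset.target_names,
    label="root",           # field labels (gini, samples, ...) only at the root node
    filled=True,
    node_ids=True,          # show each node's ID number
    proportion=True,        # samples as percentages, values as proportions
    rounded=True,
    precision=2,            # 2 digits for impurity, threshold and value
    ax=ax,                  # draw into this Axes
    fontsize=10,
)
fig.savefig("iris_tree_depth2.png")  # arbitrary file name
print(len(annotations) > 0)  # plot_tree returns the list of drawn annotation artists
```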

Gini impurity
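With the default criterion='gini', the `gini = ...` value that plot_tree shows at each node (impurity=True) is the Gini impurity G = 1 - Σ_k p_k², where p_k is the share of class k among the samples at that node. This sketch recomputes it for the root node and compares it with model.tree_.impurity[0], the impurity the fitted tree stores for node 0. The helper gini is a hypothetical name, and the tree is trained on the full dataset only to keep the numbers simple.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Hypothetical helper: Gini impurity 1 - sum_k p_k^2 of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()  # share of each class
    return 1.0 - np.sum(p ** 2)


X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)  # criterion='gini' by default

# Node 0 is the root, which holds every training sample.
# Iris has 3 classes x 50 samples, so G = 1 - 3 * (1/3)^2 = 2/3.
print(gini(y), model.tree_.impurity[0])  # both about 0.667
```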

load_iris【sklearn】